Hybrid Index Structures for Temporal-Textual Web Search
Authors
Abstract
Most Web pages contain temporal information. However, most previous studies consider only the update time of Web pages rather than fully exploiting the different temporal features found on the Web. In this paper, we propose a novel approach that fuses different temporal features of Web pages to build an efficient index structure for temporal-textual Web search. Specifically, we focus on update time and content time, and propose a hybrid index structure that organizes textual keywords, update time, and content time together. In particular, we study three mechanisms for implementing a hybrid index structure for temporal-textual Web search: (1) first inverted file, then MAP21-tree and B+-tree; (2) first inverted file, then MAP21-tree; (3) expanded inverted file. We conduct experiments on a real dataset to evaluate the performance of these hybrid index structures. The experimental results show that the "first inverted file, then MAP21-tree" index structure has the best query performance.
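The general "inverted file first, temporal index second" idea can be illustrated with a minimal sketch. This is not the paper's implementation: a sorted Python list per keyword stands in for the B+-tree over update time, and a linear overlap check stands in for the MAP21-tree over content time. The names `Page`, `HybridIndex`, `add`, and `search` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    doc_id: int
    text: str
    update_time: int       # e.g. the year the page was last updated
    content_time: tuple    # (start, end) period mentioned in the content, inclusive


class HybridIndex:
    """Inverted file whose posting lists are ordered by update time.

    Assumption: the sorted list replaces a B+-tree, and the overlap
    filter replaces an interval index such as the MAP21-tree.
    """

    def __init__(self):
        # keyword -> list of (update_time, doc_id, content_start, content_end)
        self._postings = defaultdict(list)
        self._sorted = True

    def add(self, page):
        start, end = page.content_time
        for word in set(page.text.lower().split()):
            self._postings[word].append((page.update_time, page.doc_id, start, end))
        self._sorted = False

    def _ensure_sorted(self):
        if not self._sorted:
            for plist in self._postings.values():
                plist.sort()
            self._sorted = True

    def search(self, keyword, update_range, content_range):
        """Return doc ids that contain `keyword`, were updated within
        `update_range`, and whose content time overlaps `content_range`."""
        self._ensure_sorted()
        plist = self._postings.get(keyword.lower(), [])
        u_lo, u_hi = update_range
        c_lo, c_hi = content_range
        # Step 1: range scan on update time (the B+-tree role).
        lo = bisect_left(plist, (u_lo,))
        hi = bisect_right(plist, (u_hi, float("inf")))
        # Step 2: keep postings whose content interval overlaps the query interval.
        return [doc_id for (_, doc_id, s, e) in plist[lo:hi] if s <= c_hi and e >= c_lo]


index = HybridIndex()
index.add(Page(1, "obama biography", 2010, (1980, 1985)))
index.add(Page(2, "obama election", 2008, (2008, 2008)))
index.add(Page(3, "obama early years", 2015, (1961, 1979)))
hits = index.search("obama", update_range=(2005, 2012), content_range=(1980, 1985))
```

In this sketch the keyword lookup narrows the candidates first, and the temporal predicates are evaluated only on that keyword's postings, which mirrors the ordering in mechanisms (1) and (2). An expanded inverted file, as in mechanism (3), would instead store the temporal attributes directly in the postings and filter them without a separate temporal structure.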
Similar resources
Indexing temporal information for web pages
Temporal information plays an important role in Web search, as Web pages intrinsically carry a crawl time and most Web pages contain time keywords in their content. How to integrate temporal information into Web search engines has been a research focus in recent years, and key issues such as temporal-textual indexing and temporal information extraction have to be studied first. In th...
Temporal-Textual Retrieval: Time and Keyword Search in Web Documents
As the web ages, many web documents become relevant only to certain time periods, such as web-pages containing news and events or those documenting natural phenomena. Hence, to retrieve the most relevant pages, in addition to providing the relevant keywords, one may desire to identify the relevant time period(s) as well, e.g., “Barack Obama 1980-1985”. Unfortunately, not much work has been done...
Hybrid Indexing and Seamless Ranking of Spatial and Textual Features of Web Documents
There is significant commercial and research interest in location-based web search engines. Given a number of search keywords and one or more locations that a user is interested in, a location-based web search retrieves and ranks the most textually and spatially relevant web pages. In this type of search, both the spatial and textual information should be indexed. Currently, no efficient index...
Semplore: An IR Approach to Scalable Hybrid Query of Semantic Web Data
As an extension to the current Web, Semantic Web will not only contain structured data with machine understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important that we can combine precise structured queries ...
Lightweight integration of IR and DB for scalable hybrid search with integrated ranking support
The Web contains a large amount of documents and an increasing quantity of structured data in the form of RDF triples. Many of these triples are annotations associated with documents. While structured queries constitute the principal means to retrieve structured data, keyword queries are typically used for document retrieval. Clearly, a form of hybrid search that seamlessly integrates these for...